Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize GlobeSurfaceShaderSet.getShaderProgram #2346

Merged
merged 6 commits into from
Dec 24, 2014
Merged

Conversation

pjcozzi
Copy link
Contributor

@pjcozzi pjcozzi commented Dec 23, 2014

For 3K fullscreen view of Mount Everest, it was 2.3% to 2.5% of the profile (one of the top hot-spots).

Now it is ~0.07%.

@kring at this point, we might want to rename GlobeSurfaceShaderSet to something else and make it just a function.

@kring
Copy link
Member

kring commented Dec 24, 2014

The steady-state speedup is nice, but this increases tile load time a bit and it makes tiles much larger. I think I can fix both of those without impacting the steady-state performance. I'm working on it now.

@pjcozzi
Copy link
Contributor Author

pjcozzi commented Dec 24, 2014

Sounds good, thanks. Why does it increase load time? Web worker transfer?

As for tile size, my only concern would be L2 cache pollution (I actually think I am running into this elsewhere for setting uniforms) given the tile itself is tiny compared to the renderer resources.

@kring
Copy link
Member

kring commented Dec 24, 2014

Why does it increase load time?

Because the first time we see a tile, we're now building up a giant shader string and then looking that up in the shader cache. Previously we built a smaller key and looked that up. It should be possible to avoid string lookups entirely.

my only concern would be L2 cache pollution

Yeah, exactly. We end up looking at a lot more tiles during tile selection than we actually end up rendering. I don't know if the size increase ends up having a measurable impact on the performance due to cache pollution, but as long as we're optimizing this up we might as well optimize it a bit better, since it's not much harder.

@pjcozzi
Copy link
Contributor Author

pjcozzi commented Dec 24, 2014

Ah, I almost keep the older cache too, but a non-string lookup sounds good.

As for L2, I've thought about - in general - organizing objects so that the properties needed for traversal are tightly defined with a pointer to another object with all the other properties. I believe this is a common old-school game technique for fitting as many objects as possible into cache lines. I haven't tried this because we have bigger fish to fry, but keeping the tile tight for now sounds good. Just pack them into an int or two or whatever.

@kring
Copy link
Member

kring commented Dec 24, 2014

Ok this should be in good shape now. New tiles can now get an existing shader without any string concatenations or comparisons. GlobeSurfaceTile only has one new field. Verifying that a tile's existing shader is still good requires three comparisons plus a bunch of bit operations, which should be better than a bunch of comparisons. Also, enabling/disabling lighting on the fly should actually work now. I believe it was broken in the previous version of this pull request.

@kring
Copy link
Member

kring commented Dec 24, 2014

Someone else should probably merge this now that I've changed it pretty significantly.

@pjcozzi
Copy link
Contributor Author

pjcozzi commented Dec 24, 2014

All looks good, thanks! By the way, is it possible to use the squared distance instead for computeDistanceToTile? That is another hot spot.

pjcozzi added a commit that referenced this pull request Dec 24, 2014
Optimize GlobeSurfaceShaderSet.getShaderProgram
@pjcozzi pjcozzi merged commit fcc95cf into master Dec 24, 2014
@pjcozzi pjcozzi deleted the the-need-for-speed branch December 24, 2014 14:02
@kring
Copy link
Member

kring commented Dec 27, 2014

By the way, is it possible to use the squared distance instead for computeDistanceToTile?

Yes, almost certainly. Actually the entire screenSpaceError computation can be optimized quite a bit. I think Insight3D precomputes the switching-distance-squared for each tile and just compares that, rather than doing any math per frame. I'm sure we can do the same. When I originally wrote it, the profiler showed that a negligible amount of time was spent there, so I didn't spend any time optimizing it. That has changed since, maybe because WebGL implementations have gotten faster or something.

@pjcozzi
Copy link
Contributor Author

pjcozzi commented Dec 27, 2014

Want to add it to the roadmp (#526) - or just do it! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants